Skip to content

Conversation

@irene-sheen-reef
Copy link

Previously, urllib.parse.urlsplit was being used to parse the URL. This breaks the filename/path across path, query and fragment introducing various cases where the filename changes from the user-intended filename like:

  • files starting with /
  • files ending with ? or #

Replacing it with a simpler parser which only returns (scheme, netloc, path) solves these issues.

@olzhasar-reef
Copy link

The approach you've chosen overall does the job.
It seems to closely mimic the behavior of urllib.parse.urlsplit (in a simplified manner as you've pointed out). This might be a good thing in some cases, where a generalized solution preserving the original behavior is actually needed. However, I am not sure if we have such case here.
As we essentially need to support only urls of types: b2id:// and b2://bucket/file, is there a simpler way to achieve that with less abstractions and code-induced liability?

@irene-sheen-reef
Copy link
Author

@olzhasar-reef I replaced the SplitB2Result with a simple namedtuple which should be sufficient. I considered hardcoding the schemes into the regex but that would produce less useful error messages (invalid B2 URI instead of Unsupported URI scheme) so I would prefer to leave that as is.

@olzhasar-reef
Copy link

@irene-sheen-reef
My point is that this whole url split and unsplit flow feels like an unnecessary overkill for our use case.
As scheme, netloc and path are just intermediary tools, we might as well just parse exactly what's needed straight up.

E.g.:

b2id_pattern = re.compile(r'^b2id://(?P[a-zA-Z0-9:_-]+)$')
b2_pattern = re.compile(r'^b2://(?P[a-z0-9-]+)/(?P.+)$')

And pick the appropriate one based on the prefix.

Are there any downsides compared to the approach you've suggested?

@olzhasar-reef olzhasar-reef merged commit 5d5b5d3 into reef-technologies:master Jul 30, 2025
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants